ABSTRACT 



Researchers investigated the optimum length of teacher 
inservice activities where increasing teacher efficacy was the goal. 
Participants were elementary science teachers from seven teacher enhancement 
projects conducted from 1992-99. The length breakdown of each program was: 
1992--6 weeks; 1994--6 weeks; 1995--4 weeks; 1996--4 weeks; 1997--4 weeks; 
1998--3 weeks; and 1999--2 weeks. In each of the projects, teachers completed 
the Science Teaching Efficacy Belief Instrument (STEBI) on the first and last 
days of the inservice workshops. The STEBI examined personal science teaching 
efficacy (PSTE) and science teaching outcome expectancy. Data analysis 
indicated that there was no statistically significant difference between the 
mean PSTE gain scores on the three contrast variables among the four groups 
of teachers whose PSTE pretest scores were greater than or equal to 50. This 
may be due to the fact that teachers already scored high on the PSTE scale. 
Among teachers whose pretest PSTE scores were less than 50, there were 
significant gains when comparing the mean gain scores from teachers in the 2- 
and 3-week sessions and teachers in the 4- and 6-week sessions. It was found 
that inservice intervention programs had the greatest impact on the efficacy 
of those teachers who began the program with the lowest efficacy. Given the 
consistent relationships demonstrated between teacher efficacy and positive 
student outcomes, inservices impacting teachers' low efficacy are worth close 
examination. (Contains 32 references and 8 tables.) (SM) 
Abstract 

Utilizing the data collected from National Science Foundation, National Institutes of Health, 
and Eisenhower funded teacher enhancement projects, this paper also will present results on the 
effectiveness of differing lengths of inservice activities in raising teachers’ self-efficacy. 
An examination of change in teacher self-efficacy beliefs in science education based on the 
duration of inservice activities 

Albert Bandura (1977a, 1997) presented self-efficacy as a mechanism of behavioral 
change and self-regulation in his social cognitive theory. An efficacy belief is one’s perceived 
ability to carry out actions that will lead successfully toward a specific goal. Bandura proposed 
that efficacy beliefs were powerful predictors of behavior since they were ultimately self-referent 
in nature and directed toward specific tasks. The predictive power of efficacy beliefs has been 
borne out in the research (Bandura, 1997; Pajares, 1996; Tschannen-Moran, Woolfolk Hoy, & 
Hoy, 1998). 

The recognition and measurement of self-efficacy is especially important to researchers 
of the social sciences. Bandura (1982) noted that highly efficacious people tend to show higher 
levels of effort and are resilient in continuing this effort, even in the face of adverse situations. 

As a result, recognizing and increasing a person’s self-efficacy could eventually lead them to 
work harder, and to persist under more adverse conditions, than their counterparts with lower self-efficacy. 

When Bandura first published his work on efficacy in 1977, he hypothesized for the 
social psychologist that there were two dimensions from which efficacy springs: self-efficacy 
and outcome expectancy. Bandura defined self-efficacy as “the conviction that one can 
successfully execute the behavior required to produce the outcomes” (1977b, p. 79), and 
outcome expectancy as “a person’s estimate that a given behavior will lead to certain outcomes” 
(1977b, p. 79). 

Many researchers have applied Bandura’s social cognitive theory concepts to teachers. 
Among the first of the researchers were Ashton and Webb (1982). Ashton and Webb argued that 
two items previously used by RAND researchers (Armor et al., 1976; Berman, McLaughlin, 
Bass, Pauly, & Zellman, 1977) to study teacher efficacy actually corresponded to Bandura’s 
self-efficacy and outcome expectancy dimensions of social cognitive theory. These two dimensions 
have subsequently been identified as personal teaching efficacy and general (or outcome) 
teaching efficacy, respectively. In generalizing these two educational constructs, Schriver and 
Czerniak (1999) said that “self-efficacy has generally been defined as the belief that one’s 
teaching ability is related to positive changes in students’ behaviors and achievement levels, and 
outcome expectancy is the belief that any teacher, in spite of all other factors, can affect student 
learning” (p. 23). To further the study of teacher efficacy, Gibson and Dembo (1984) developed 
the Teacher Efficacy Scale (TES) to measure both of these constructs. The TES was the first 
attempt to develop an empirical data collection instrument to tap into this potentially powerful 
variable in teachers. 

Teacher efficacy is a context and even subject-matter specific construct. A teacher may 
feel very confident in his or her ability to impact student learning while teaching mathematics, 
but quite inefficacious while teaching social studies. Accordingly, some researchers have 
modified the TES and developed subject matter-specific instruments. Riggs and Enochs (1990), 
for example, have developed the Science Teaching Efficacy Belief Instrument, or STEBI, and 
the Microcomputer Utilization in Teaching Efficacy Beliefs Instrument, or MUTEBI (Enochs, 
Riggs, & Ellis, 1993). Based on the TES, the STEBI and MUTEBI also consist of two 
dimensions, called personal science teaching efficacy (PSTE) and science teaching outcome 
expectancy (STOE), which are believed to correspond with Bandura’s self-efficacy and outcome 
expectancy constructs. 

PSTE scores have been positively related to teaching performance (Riggs et al., 1994), 
teachers’ reported enjoyment of science-related activities, and teachers’ ratings of the personal 
relevance of science (Watters & Ginns, 1995). Riggs and Jesunathadas (1993) found that 
teachers high in PSTE were more likely to spend the time needed to develop a science concept in 
class. Teachers scoring low in PSTE were reported to spend less time teaching science, to be rated 
weak by observers, and to be less likely to choose to teach science (Riggs, 1995). Teachers scoring 
low on the STEBI STOE scale were rated as less effective in science teaching (Enochs, 
Scharmann, & Riggs, 1995). These teachers often used more text-based, rather than activity- 
based, instruction and employed less cooperative learning (Riggs, 1995). 

Although many efforts have been made to increase the level of teachers’ efficacy, and 
many studies have monitored change in efficacy during the course of an inservice or other 
training program, little research has been done to monitor the optimum length of these programs 
with respect to raising teacher efficacy. The purpose of the present paper is to provide a 
framework for understanding the optimum length of teacher inservice activities when increasing 
teacher efficacy is a goal of the intervention. 

Data collection 

More than 330 teachers were involved in the data collection process. These teachers 
were drawn from a cohort gathered through seven National Science Foundation (NSF), National 
Institutes of Health, and Eisenhower funded teacher enhancement projects. Inservice programs 
were conducted from 1992 through 1999. The length breakdown of each program is as follows: 
1992 - 6 weeks; 1994 - 6 weeks; 1995 - 4 weeks; 1996 - 4 weeks; 1997 - 4 weeks; 1998 - 3 
weeks; and 1999 - 2 weeks. In each of these inservice projects, the STEBI was given in a 
pretest/posttest fashion on the first and last day of the workshops. It is understood that 
differences in the functions and effectiveness of the inservice activities will themselves influence 
change in personal efficacy scores. However, this study has the advantage of analyzing data on 
the STEBI from a number of inservice programs conducted by the same principal investigators. 

One point of concern, also noted by Ross (1994), is the difficulty in bringing about 
changes in personal teacher efficacy through a staff development program. He and Little (1984) 
addressed this problem by involving teachers in a more interactive inservice that included 
teacher practice. The present study used a similar approach. While the length of each inservice 
differed (between two and six weeks), the purpose and content of each remained the same: to 
develop inquiry-based science skills and content knowledge among existing elementary teachers 
through hands-on experiences and interaction with experienced master teachers and scientists. 

The groups of teachers also were relatively homogeneous, although the number of years 
of teaching experience differed. All participants in the summer training programs were 
elementary school science teachers in the Houston area. Although researchers are relatively 
certain that teaching experience ranged from 1 to 25 years, more specific information was not 
available for some of the cohorts because of the archival aspect of some of the datasets. As a 
result, only STEBI scores and lengths of interventions could be used in this analysis. The 
uncertain consistency and availability of other types of demographic data made it impossible to 
include those factors in the analysis at this time. 

The Outcome Expectancy Scale of the STEBI 

Once data from the seven different measurement occasions were collected, reliability 
estimates were conducted to confirm the data used for analysis in this report. (The correlation 
matrix for the data analyzed in this paper is presented in Table 1.) The first step was to perform 
a confirmatory factor analysis (CFA) using the items from the STEBI to model a two-factor 
solution (PSTE and STOE). This analysis was performed with AMOS 4.0. 

The results showed that the fit of the model to the data was not very strong. Table 2 
presents the findings from the CFA. The fit statistics seem to indicate that there 
are some problems with either the data or the model design. 



Insert Tables 1 and 2 about here 



When these problems were noted, an exploratory factor analysis was performed to 
determine if the items actually were being allowed to load on the right factors. When the 
exploratory two-factor solution was run with the data, all items were placed in the factors that 
Riggs and Enochs (1990) had originally defined. Although the two-factor solution confirmed the 
loadings of the items into the two originally hypothesized factors, it was noted that this solution 
only accounted for 38.5% of the variance. While the two-factor solution is very parsimonious, it 
calls into question the reliability of a solution that leaves more than 60% of the overall 
variance unexplained. However, even the seven-factor solution explains only 60% of the variance. Stevens 
(1996) states that, as a general rule of thumb, one would want the factors extracted to 
account for at least 70% of the variance. 
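
The percentages above follow from the eigenvalues of the principal component solution: with 25 standardized STEBI items the total variance is 25, so each factor's share is its eigenvalue divided by 25. The following is a minimal sketch of that arithmetic using the eigenvalues reported in Table 4. The variable and function names are illustrative, and the 25-item count is an assumption taken from the item numbering in Table 3.

```python
# Eigenvalues of the first seven factors as reported in Table 4.
eigenvalues = [6.290, 3.342, 1.553, 1.067, 1.006, 0.980, 0.922]

# Assumption: 25 standardized items (Table 3 lists items 1-25),
# so the total variance to be explained is 25.
N_ITEMS = 25


def cumulative_variance(eigs, n_items):
    """Return the cumulative percentage of variance explained after each factor."""
    cumulative, running = [], 0.0
    for eig in eigs:
        running += 100.0 * eig / n_items
        cumulative.append(running)
    return cumulative


cum_pct = cumulative_variance(eigenvalues, N_ITEMS)
for k, pct in enumerate(cum_pct, start=1):
    print(f"{k} factor(s): {pct:.2f}% cumulative")
```

With these inputs, two factors account for about 38.5% of the variance and seven factors for about 60.6%, both below the 70% rule of thumb.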

The question that arises is whether the instrument produces reliable data and whether those 
data are appropriate to use for the purposes of monitoring teacher efficacy. Further analyses 
performed on the STEBI data showed that most of the items that loaded on the first factor (in the 
two-factor solution) continued to load on that factor when four- and five-factor solutions were 
specified. The items defining that first factor make up the personal science teaching efficacy 
(PSTE) scale of the STEBI. 



Insert Tables 3 and 4 about here 



Other researchers also have noted the problems associated with the outcome expectancy 
scale (STOE) of the STEBI. In particular, Tschannen-Moran et al. (1998) have argued that this 
dimension is a measure of external locus of control, as opposed to outcome expectancy. Several 
researchers support this conclusion (Guskey & Passaro, 1994; Coladarci & Fink, 1995). Given 
that the STOE scale of the STEBI was modeled after the TES, then the STOE scale also likely 
evaluates external locus of control. With the possible exception of the article by Schriver and 
Czerniak (1999), few research projects have noted differences in the outcome expectancy 
dimension of the STEBI (cf. Cannon & Scharmann, 1994). For this reason, only the PSTE scale 
of the STEBI was used when performing analyses for this paper. 

Data analysis 

When first exploring the data, it seemed there was a relatively small difference between 
the cohorts in the four different lengths of inservice programs (2, 3, 4, and 6 weeks). Upon 
closer examination, however, it seemed that there was a ceiling effect among the people who 
scored high on the PSTE scale of the STEBI pretest. As a result, efforts were made to identify 
teachers who scored low on the PSTE scale pretest, and a criterion was set: teachers scoring 
below the mode score (50) were separated from the rest of the dataset to be used in further 
analyses. These teachers were chosen not only because of their low scores, but also because they 
had more potential for improvement than their counterparts. The data in Table 5 appear to 
validate this decision. 







Insert Table 5 about here 



Another concern of the analysis was the use of gain scores. Although Huck and McLean 
(1975) suggest using an ANCOVA type design over the use of gain scores, they do provide 
estimations of gain score reliability for when a gain score method is needed instead of ANCOVA 
designs. By obtaining the average reliability between the pre and posttest and the correlation 
between the two tests, one is able to determine the gain score reliability. The computation stems 
from the fact that “as the correlation between pre and posttest scores approaches the reliability of 
the test, the reliability of the difference scores goes to 0” (Stevens, 1996, p. 328). Using Huck 
and McLean’s estimation procedure, we were able to determine that the gain score reliability is 
.67 based on an average reliability (alpha) of .7717 and a correlation of .372. 
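
A commonly used form of the difference-score reliability formula, which assumes equal pretest and posttest variances, is (average reliability - pre/post correlation) / (1 - pre/post correlation). The sketch below applies that form to the values reported above. It is an illustration under the equal-variance assumption, not necessarily the exact computation performed for this paper: it yields roughly .64, slightly below the reported .67, so the original procedure may have differed in detail. The function name is illustrative.

```python
def gain_score_reliability(avg_reliability, pre_post_corr):
    """Reliability of difference scores, assuming equal pre/post variances."""
    if pre_post_corr >= 1.0:
        raise ValueError("pre/post correlation must be below 1")
    return (avg_reliability - pre_post_corr) / (1.0 - pre_post_corr)


# Values reported in the text: average alpha = .7717, pre/post correlation = .372.
estimate = gain_score_reliability(0.7717, 0.372)
print(f"estimated gain score reliability: {estimate:.3f}")

# As the pre/post correlation approaches the test reliability,
# the reliability of the difference scores goes to 0.
limit_case = gain_score_reliability(0.7717, 0.7717)
print(f"when correlation equals reliability: {limit_case:.3f}")
```

The second call illustrates the point quoted from Stevens (1996): when the pre/post correlation equals the reliability of the test, the numerator vanishes.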

After results from data in Table 5 had been consulted, it was decided that a planned 
contrast design should be used instead of omnibus hypothesis testing because planned 
contrasts yield results that are more readily interpretable (Hinkle, 
Wiersma, & Jurs, 1998). The contrasts tested in the ANOVA are listed in Table 6. These 
contrast variables were then used in a regression equation (the planned contrast ANOVA) to 
predict the gain scores in the PSTE scale among the teachers scoring below 50 on the pretest. 
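
Because the four groups differ in size, the weights in Table 6 are not the usual plus or minus 1 codes. They appear to be scaled so that each contrast sums to approximately zero when weighted by group size, and so that the three contrasts are approximately orthogonal to one another. The sketch below checks this using the group sizes and weights listed in Table 6. The dictionary keys and helper names are illustrative.

```python
# Group sizes and contrast weights as listed in Table 6 (pretest PSTE < 50).
n = {"2wk": 45, "3wk": 17, "4wk": 107, "6wk": 19}
contrasts = {
    "2 vs 3":     {"2wk": 1.00, "3wk": -2.65, "4wk": 0.00,  "6wk": 0.00},
    "4 vs 6":     {"2wk": 0.00, "3wk": 0.00,  "4wk": 1.00,  "6wk": -5.63},
    "2&3 vs 4&6": {"2wk": 2.00, "3wk": 2.00,  "4wk": -1.00, "6wk": -1.00},
}


def weighted_sum(c):
    """Sum of the weights over all teachers; near 0 for a centered contrast."""
    return sum(n[g] * c[g] for g in n)


def weighted_cross(c1, c2):
    """Cross-product over all teachers; near 0 for orthogonal contrasts."""
    return sum(n[g] * c1[g] * c2[g] for g in n)


sums = {name: weighted_sum(c) for name, c in contrasts.items()}
names = list(contrasts)
crosses = {
    (a, b): weighted_cross(contrasts[a], contrasts[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
print(sums)
print(crosses)
```

The sums and cross-products are small relative to the total sample of 188. The third contrast sums to -2 rather than exactly 0 because 2 x 62 is 124 while the two longer groups total 126. Near-orthogonality is consistent with the three contrast sums of squares in Table 7 adding up to the subtotal.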



Insert Table 6 about here 



Analyses also were conducted with data obtained from the teachers who scored 50 or above 
on the PSTE scale of the STEBI pretest. The same contrasts were used when examining the 
differences between the scores of the different lengths of inservice experiences among this 
group. 

Results 

The purpose of this analysis was largely experimental. Based on the data available, 
researchers were interested in the optimum length for an inservice activity that had as a target 
increasing teacher self-efficacy. The first area of interest involved the increase in efficacy of 
teachers who originally scored below 50 on the PSTE scale pretest of the STEBI. When these 
data were analyzed with a planned contrast analysis, it was noted that differences between mean 
PSTE gain scores among teachers in the 2-week and 3-week programs and differences between 
mean PSTE gain scores among teachers in the 4-week and 6-week inservice programs were not 
statistically significant. However, when the mean PSTE gain scores of the teachers in the 2- and 
3-week programs were contrasted against the mean PSTE gain scores of the teachers in the 4- and 
6-week programs, statistically significant results were found, thus rejecting the null hypothesis 
that the mean gain scores of these two groups were the same. Results from this first analysis can 
be found in Table 7. 
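
The significant contrast can be approximated from the summary statistics alone. The sketch below pools the mean PSTE gains for teachers with pretest scores below 50 (Table 5), weighting by the group sizes in Table 6, and then computes the sum of squares for a two-group comparison as n1*n2/(n1+n2) times the squared mean difference. Because the inputs are rounded group means rather than raw scores, the result only approximates the 260.399 reported in Table 7. The function and variable names are illustrative.

```python
# Mean PSTE gain (Table 5, pretest < 50) and group size (Table 6) by program length.
groups = {
    2: {"n": 45,  "mean_gain": 7.47},
    3: {"n": 17,  "mean_gain": 8.65},
    4: {"n": 107, "mean_gain": 10.32},
    6: {"n": 19,  "mean_gain": 10.16},
}


def pooled(weeks):
    """Return (total n, n-weighted mean gain) for the given program lengths."""
    total_n = sum(groups[w]["n"] for w in weeks)
    mean = sum(groups[w]["n"] * groups[w]["mean_gain"] for w in weeks) / total_n
    return total_n, mean


n_short, mean_short = pooled([2, 3])
n_long, mean_long = pooled([4, 6])

diff = mean_long - mean_short
ss_contrast = (n_short * n_long) / (n_short + n_long) * diff ** 2

print(f"2- and 3-week: n={n_short}, mean gain={mean_short:.2f}")
print(f"4- and 6-week: n={n_long}, mean gain={mean_long:.2f}")
print(f"difference={diff:.2f}, contrast SS={ss_contrast:.1f}")
```

On these inputs the longer programs show a mean gain roughly 2.5 points higher than the shorter ones, and the approximate contrast sum of squares lands close to the tabled value.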



Insert Tables 7 and 8 about here 



The same contrasts then were carried out with the gain scores from teachers whose score 
was greater than or equal to 50 on the PSTE scale of the STEBI pretest. This analysis produced 
no statistically significant results among the teachers’ mean PSTE gain scores in the first contrast 
(2-week and 3-week vs. 4-week and 6-week), the second contrast (2-week vs. 3-week), or the 
third contrast (4-week vs. 6-week). Therefore, we failed to reject the null hypothesis that there 
was no difference in teachers’ mean PSTE gain scores for the three contrast variables. 

Discussion 

The first discovery of note, which came as no surprise, was that there was no statistically 
significant difference between the mean PSTE gain scores on the three contrast variables among 
the four groups of teachers whose score on the PSTE scale pretest was greater than or equal to 
50. This outcome might be interpreted as a result of the fact that teachers already scored high on 
the PSTE scale. Therefore, there was not a lot of room for improvement or mean PSTE gain 
score increase. These results would be expected from any study where a ceiling effect occurred. 
They also seem to correspond with the current literature showing the difficulty in raising the self- 
efficacy of teachers who already have high levels of personal self-efficacy or who are 
experienced teachers (cf. Anderson, Greene, & Loewen, 1988; Ohmart, 1992). Since self-efficacy 
is formed at least partially from one’s experiences, as teachers move into their career, their 
efficacy beliefs tend to become less malleable. 

The second outcome, which probably is of more practical importance, is the result from 
the analysis involving teachers whose score on the PSTE scale pretest was less than 50. This 
group provided the most room for growth in self-efficacy, and is exactly the group that many 
teacher inservices target for improvement. From Table 7, we can infer that statistically, in 
terms of mean gain scores on the PSTE scale, there is no difference between a 2-week and a 3- 
week training session, nor is there a statistical difference between a 4-week and 6-week session. 
The benefit in this area is largely in terms of cost. Suppose an administrator were faced with the 
decision of sending his/her teachers to either a 2- or 3-week inservice program. All other factors 
being equal (e.g., quality of presentation and amount of material covered), the results of this 
study show that teachers’ efficacy will be raised about the same in either program. If one of the 
goals of sending teachers to the inservice were to raise their self-efficacy, and if cost were a 
concern, then the administrator would be able to save money and send the teachers to the 2-week 
program, rather than the 3-week program. Likewise, in terms of self-efficacy, administrators 
would do just as well to send their teachers to a 4-week inservice, rather than a 6-week inservice 
(all other things being equal). 

There was, however, a statistically significant difference on the PSTE scale when 
comparing the mean gain scores from teachers in the 2- and 3-week sessions and teachers in the 
4- and 6-week sessions. The results from this contrast variable in Table 7 have interesting 
consequences. For the administrator or program designer, they tend to suggest that a 4-week 
inservice is probably the best use of resources if the goal of the program is to raise teachers’ self- 
efficacy and money is not an issue. 

Conclusion 

While the results from the first contrast variable in Table 7 are statistically significant, it 
should be noted that this contrast only has an R² of .038 and an adjusted R² of .033. Although 
Cohen would categorize this effect size as small, it still seems to be resilient when accounting for 
sampling error, as reflected in the lack of shrinkage in the adjusted R². Because the effect size is 
small, researchers are cautioned against treating these results as firm guidance on how long an inservice 
should be. In fact, this small effect size demonstrates the need for further research in this area. 
Future research should include not only teachers in the primary grades, but also in the secondary 
grades, and should include other measures of teacher expertise, such as teaching experience and 
previous training. 
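
The effect sizes discussed above can be recomputed from the sums of squares in Table 7: R² is a contrast's sum of squares divided by the total, and the adjusted value applies the standard correction for the number of predictors. The sketch below assumes the conventional adjusted-R² formula and treats each contrast as a single predictor (three predictors for the subtotal). The names are illustrative.

```python
# Sums of squares and degrees of freedom from Table 7 (pretest PSTE < 50).
SS_TOTAL, DF_TOTAL = 6774.809, 187
SS_ERROR, DF_ERROR = 6496.805, 184


def r_squared(ss_effect):
    """Proportion of total variation in gain scores attributed to the effect."""
    return ss_effect / SS_TOTAL


def adjusted_r_squared(r2, n_predictors):
    """Conventional adjustment: 1 - (1 - R^2)(n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * DF_TOTAL / (DF_TOTAL - n_predictors)


def f_ratio(ss_effect, df_effect):
    """Effect mean square divided by the error mean square."""
    return (ss_effect / df_effect) / (SS_ERROR / DF_ERROR)


# First contrast: 2- and 3-week vs. 4- and 6-week programs.
r2_first = r_squared(260.399)
adj_first = adjusted_r_squared(r2_first, 1)
f_first = f_ratio(260.399, 1)

# All three contrasts together (the subtotal row).
r2_all = r_squared(278.004)
adj_all = adjusted_r_squared(r2_all, 3)

print(f"first contrast: R2={r2_first:.3f}, adj R2={adj_first:.3f}, F={f_first:.3f}")
print(f"all contrasts:  R2={r2_all:.3f}, adj R2={adj_all:.3f}")
```

Under these assumptions the computed values agree with the tabled ones to three decimals, which illustrates how little the adjusted R² shrinks for the first contrast.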

Despite the small effect size, this paper can begin the process of providing information 
about the relative cost-effectiveness of inservice programs designed to increase the self-efficacy 
of teacher participants. The results of the present study are compelling because the inservice 
interventions had the greatest impact on the efficacy of those teachers who began the program 
with the lowest efficacy beliefs. Given the consistent relationships between teacher efficacy and 
positive student outcomes and teaching behaviors (see e.g., Anderson et al., 1988; Coladarci, 
1992; Gibson & Dembo, 1984; Moore & Esselman, 1992; Podell & Soodak, 1993; Soodak & 
Podell, 1993), inservices that can impact the low efficacy in individual teachers are worth close 
examination. 
Table 2 

Results from the CFA of the STEBI data 

Fit Measure    Value 
Chi Square     625.749 
CFI            .855 
PCFI           .781 
NFI            .771 
GFI            .863 
RMSEA          .062 
AGFI           .838 
Table 3 

2-factor solution pattern matrix and structure matrix of STEBI data 

          Pattern Matrix          Structure Matrix 
Item    Factor 1    Factor 2    Factor 1    Factor 2 
 1                    .514                    .539 
 2        .522                    .558 
 3        .708                    .700 
 4                    .569                    .575 
 5        .640                    .636 
 6        .579                    .566 
 7                    .609                    .556 
 8        .710                    .697 
 9                    .664                    .666 
10                    .433                    .418 
11                    .656                    .654 
12        .605                    .612 
13                    .360                    .388 
14                    .634                    .626 
15                    .725                    .713 
16                    .641                    .652 
17        .710                    .700 
18        .570                    .587 
19        .675                    .651 
20                    .324                    .356 
21        .662                    .674 
22        .770                    .759 
23        .615                    .648 
24        .744                    .755 
25                    .405                    .428 

Note: Extraction method: Principal Component Analysis 
Rotation method: Promax with Kaiser Normalization 






Table 4 

Total variance explained by factors from the STEBI data 

Factor    Eigenvalue    % of Variance    Cumulative % 
1         6.290         25.160           25.160 
2         3.342         13.368           38.528 
3         1.553          6.212           44.740 
4         1.067          4.268           49.008 
5         1.006          4.024           53.032 
6         0.980          3.918           56.950 
7         0.922          3.688           60.638 
Table 5 

Descriptive statistics for the PSTE scale of the STEBI data 

Model                                     Mean     Median    Mode 
STEBI pretest                             46.95    48        50 
STEBI posttest                            53.53    53        53 
Gain scores                                6.58     6         3 

Gain scores by length of intervention 
  2 weeks                                  5.04     5         6 
  3 weeks                                  6.24     5         5 
  4 weeks                                  7.47     8         3 
  6 weeks                                  6.47     6         1 

Gain scores by pretest scores 
  Pretest <50 
    2 weeks                                7.47     7         7 
    3 weeks                                8.65     6         5 
    4 weeks                               10.32    11         7 
    6 weeks                               10.16    10        10 
  Pretest >=50 
    2 weeks                                2.32     3         4 
    3 weeks                                4.20     5         5 
    4 weeks                                2.21     2         0 
    6 weeks                                3.54     4         1 
Table 6 

Contrasts for regression of STEBI data for teachers scoring below 50 on the PSTE scale of the 
pretest. 

                            Test for the          Test for the          Test for the difference 
                            difference between    difference between    between 2 week and 3 week 
                            2 week and 3 week     4 week and 6 week     vs. 4 week and 6 week 
Session                     session               session               session 
2 week session (n=45)        1.00                  0.00                  2.00 
3 week session (n=17)       -2.65                  0.00                  2.00 
4 week session (n=107)       0.00                  1.00                 -1.00 
6 week session (n=19)        0.00                 -5.63                 -1.00 

Table 7 

Planned Contrast ANOVA for the PSTE scale of the STEBI data for teachers scoring below 50 
on the pretest. 

Model                          SS          df     MS         F        Sig     R² 
2 & 3 weeks vs. 4 & 6 weeks     260.399      1    260.399    7.375    .007    .038 (.033)* 
2 weeks vs. 3 weeks              17.192      1     17.192    0.487    .492    .003 (-.003)* 
4 weeks vs. 6 weeks               0.412      1      0.412    0.012    .915    .000 (-.005)* 
(Subtotal)                      278.004      3     92.668    2.625    .052    .041 (.025)* 
Error                          6496.805    184     35.309 
Total                          6774.809    187 

Note: * Adjusted R² in parentheses 
Table 8 

Planned Contrast ANOVA for the PSTE scale of the STEBI data for teachers whose scores were 
greater than or equal to 50 on the pretest. 

Model                          SS          df     MS        F        Sig     R² 
2 & 3 weeks vs. 4 & 6 weeks       4.304      1     4.304     .148    .701    .001 (-.006)* 
2 weeks vs. 3 weeks              49.175      1    49.175    1.711    .193    .012 (.005)* 
4 weeks vs. 6 weeks              24.777      1    24.777     .857    .356    .006 (-.001)* 
(Subtotal)                       81.423      3    27.141     .938    .424    .020 (-.001)* 
Error                          3991.451    138    28.924 
Total                          4072.873    141 

Note: * Adjusted R² in parentheses 